Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
As new laws governing management of personal data are introduced, e.g., the European Union’s General Data Protection Regulation of 2016 and the California Consumer Privacy Act of 2018, compliance with data governance legislation is becoming an increasingly important aspect of data management. An important component of many data privacy laws is that they require companies to only use an individual’s data for a purpose the individual has explicitly consented to. Prior methods for enforcing consent for aggregate queries either use access control to eliminate data without consent from query evaluation or apply differential privacy algorithms to inject synthetic noise into the outcomes of queries (or input data) to ensure that the anonymity of non-consenting individuals is preserved with high probability. Both approaches return query results that differ from the ground truth results corresponding to the full input containing data from both consenting and non-consenting individuals. We present an alternative frame- work for group-by aggregate queries, tailored for applications, e.g., medicine, where even a small deviation from the correct answer to a query cannot be tolerated. Our approach uses provenance to determine, for each output tuple of a group-by aggregate query, which individual’s data was used to derive the result for this group. We then use statistical tests to determine how likely it is that the presence of data for a non-consenting individual will be revealed by such an output tuple. We filter out tuples for which this test fails, i.e., which are deemed likely to reveal non-consenting data. Thus, our approach always returns a subset of the ground truth query answers. Our experiments successfully return only 100% accurate results in instances where access control or differential privacy would have either returned less total or less accurate results.more » « less
-
Data privacy policy requirements are a quickly evolving part of the data management domain. Healthcare (e.g., HIPAA), financial (e.g., GLBA), and general laws such as GDPR or CCPA impose controls on how personal data should be managed. Relational databases do not offer built-in features to support data management features to comply with such laws. As a result, many organizations implement ad-hoc solutions or use third party tools to ensure compliance with privacy policies. However, external compliance framework can conflict with the internal activity in a database (e.g., trigger side-effects or aborted transactions). In our prior work, we introduced a framework that integrates data retention and data purging compliance into the database itself, requiring only the support for triggers and encryption, which are already available in any mainstream database engine. In this demonstration paper, we introduce DBCompliant – a tool that demonstrates how our approach can seamlessly integrate comprehensive policy compliance (defined via SQL queries). Although we use PostgreSQL as our back-end, DBCompliant could be adapted to any other relational database. Finally, our approach imposes low (less than 5%) user query overhead.more » « less
-
In file systems and database management systems (DBMSes), deleting data marks it as unallocated storage rather than explicitly erasing data. This data can be reconstructed from raw storage, making it vulnerable to data theft and exposing organizations to liability and compliance risks, violating data retention and destruction policies. The problem is further magnified in DBMSes because (unlike in file systems) DBMS backups are performed in pages and will include such deleted records. Data erasure (or sanitization) is a process that eliminates this vulnerability, providing users with “the right to be forgotten”. However, most of the work in data sanitization is only relevant to erasing data at the file system level, and not in DBMSes. Limited existing work in database sanitization takes an erase-on-commit approach, which can introduce significant I/O bottlenecks. In this paper, we describe a novel data sanitization method, DBSanitizer, that 1) is DBMS agnostic, 2) can batch value erasure, and 3) targets specific data to erase. DBSanitizer is designed as a template for DBMS vendors to support backup sanitization and ensure that no undesirable data is retained in backups. In this paper, we demonstrate how our approach can be used in any row-store relational DBMS (including Oracle, PostgreSQL, MySQL, and SQLite). As there are no backup sanitization tools available on the market or in research literature, we evaluate DBSanitizer, in a live database that supports erase-on-commit sanitization approach.more » « less
-
From the United States’ Health Insurance Portability and Accountability Act (HIPAA) to the European Union’s General Data Protection Regulation (GDPR), there has been an increased focus on individual data privacy protection. Because multiple enforcement agencies (such as legal entities and external governing bodies) have jurisdiction over data governance, it is possible for the same data value to be subject to multiple (and potentially conflicting) policies. As a result, managing and enforcing all applicable legal requirements has become a complex task. In this paper, we present a comprehensive overview of the steps to integrating data retention and purging into a database management system (DBMS). We describe the changes necessary at each step of the data lifecycle management, the minimum functionality that any DBMS (relational or NoSQL) must support, and the guarantees provided by this system. Our proposed solution is 1) completely transparent from the perspective of the DBMS user; 2) requires only a minimal amount of tuning by the database administrator; 3) imposes a negligible performance overhead and a modest storage overhead; and 4) automates the enforcement of both retention and purging policies in the database.more » « less
-
Cyberattacks continue to evolve and adapt to state-of-the-art security mechanisms. Therefore, it is critical for security experts to routinely inspect audit logs to detect complex security breaches. However, if a system was compromised during a cyberattack, the validity of the audit logs themselves cannot necessarily be trusted. Specifically, for a database management system (DBMS), an attacker with elevated privileges may temporarily disable the audit logs, bypassing logging altogether along with any tamper-proof logging mechanisms. Thus, security experts need techniques to validate logs independent of a potentially compromised system to detect security breaches. This paper demonstrates that SQL query operations produce a repeatable set of patterns within DBMS process memory. Operations such as full table scans, index accesses, or joins each produce their own set of distinct forensic artifacts in memory. Given these known patterns, we propose that collecting forensic artifacts from a trusted memory snapshot allows one to reverse-engineer query activity and validate audit logs independent of the DBMS itself and outside the scope of a database administrator's privileges. We rely on the fact the queries must ultimately be processed in memory regardless of any security mechanisms they may have bypassed. This work is generalized to all relational DBMSes by using two representative DBMSes, Oracle and MySQL.more » « less
-
Data privacy requirements are a complex and quickly evolving part of the data management domain. Especially in Healthcare (e.g., United States Health Insurance Portability and Accountability Act and Veterans Affairs requirements), there has been a strong emphasis on data privacy and protection. Data storage is governed by multiple sources of policy requirements, including internal policies and legal requirements imposed by external governing organizations. Within a database, a single value can be subject to multiple requirements on how long it must be preserved and when it must be irrecoverably destroyed. This often results in a complex set of overlapping and potentially conflicting policies. Existing storage systems are lacking sufficient support functionality for these critical and evolving rules, making compliance an underdeveloped aspect of data management. As a result, many organizations must implement manual ad-hoc solutions to ensure compliance. As long as organizations depend on manual approaches, there is an increased risk of non-compliance and threat to customer data privacy. In this paper, we detail and implement an automated comprehensive data management compliance framework facilitating retention and purging compliance within a database management system. This framework can be integrated into existing databases without requiring changes to existing business processes. Our proposed implementation uses SQL to set policies and automate compliance. We validate this framework on a Postgres database, and measure the factors that contribute to our reasonable performance overhead (13% in a simulated real-world workload).more » « less
-
Data compliance laws establish rules intended to protect privacy. These define both retention durations (how long data must be kept) and purging deadlines (when the data must be destroyed in storage). To comply with the laws and to minimize liability, companies must destroy data that must be purged or is no longer needed. However, database backups generally cannot be edited to purge ``expired'' data and erasing the entire backup is impractical. To maintain compliance, data curators need a mechanism to support targeted destruction of data in backups. In this paper, we present a cryptographic erasure framework that can purge data from across database backups. We demonstrate how different purge policies can be defined through views and enforced without violating database constraints.more » « less
-
Data retention laws establish rules intended to protect privacy. These define both retention durations (how long data must be kept) and purging deadlines (when the data must be destroyed in storage). To comply with the laws and to minimize liability, companies should destroy data that must be purged or is no longer needed. However, database backups generally cannot be edited to purge “expired” data and erasing the entire backup is impractical. To maintain compliance, data curators need a mechanism to support targeted destruction of data in backups. In this paper, we present a cryptographic erasure framework that can purge data from all database backups. Our approach can be transparently integrated into existing database backup processes. We demonstrate how different purge policies can be defined through views and enforced by triggers without violating database constraints.more » « less
-
Compliance with data retention laws and legislation is an important aspect of data management. As new laws governing personal data management are introduced (e.g., California Consumer Privacy Act enacted in 2020) and a greater emphasis is placed on enforcing data privacy law compliance, data retention support must be an inherent part of data management systems. However, relational databases do not currently offer functionality to enforce retention compliance. In this paper, we propose a framework that integrates data retention support into any relational database. Using SQL-based mechanisms, our system supports an intuitive definition of data retention policies. We demonstrate that our approach meets the legal requirements of retention and can be implemented to transparently guarantee compliance. Our framework streamlines compliance support without requiring database schema changes, while incurring an average 6.7% overhead compared to the current state-of-the-art solution.more » « less
-
Best practices in data management and privacy mandate that old data must be irreversibly destroyed. However, due to performance optimization reasons, old (deleted or updated) data is not immediately purged from active database storage. Database backups that typically work by backing up table and index pages (rather than logical rows) greatly exacerbate the privacy problem of the old surviving data. Copying such deleted data into backups ensures that unknown quantities of old data can be stored indefinitely. In this paper, we quantify the amount of deleted data retained in backups by four major representative databases, comparing the default behavior versus an explicit defrag operation. We review the defrag options available in these databases and discuss the impact they have on eliminating old data from backups. We demonstrate that each database has a defrag mechanism that can eliminate most of old deleted data (although in Oracle pre-update content may survive defrag). Finally, we outline the factors that organizations should consider when deciding whether to apply defrag prior to executing their backups.more » « less
An official website of the United States government

Full Text Available